Automata-guided Context-free parsing for punctuationless languages
Abstract
We propose a system for analyzing texts written in languages that do not use punctuation, with syntactic tagging in mind. The core system is a simple chart parser; to cope with the problems of complexity and ambiguity, we use simplified finite-state automata that guide the analysis. An application to Ancient Egyptian texts is introduced.
Similar resources
Parsing with Pictures
The development of elegant and practical algorithms for parsing context-free languages is one of the major accomplishments of 20th century Computer Science. These algorithms are presented in the literature using string rewriting systems or abstract machines like pushdown automata, but the resulting descriptions are unsatisfactory for several reasons. First, even a basic understanding of parsing a...
Language Approximation With One-Counter Automata
We present a method for approximating context-free languages with one-counter automata. This approximation allows the reconstruction of parse trees of the original grammar. We identify a decidable superset of regular languages whose elements, i.e. languages, are recognized by one-counter automata.
A Note on the Succinctness of Descriptions of Deterministic Languages
The result proved in this paper is that for the elements of some infinite class of deterministic context-free languages the size of deterministic pushdown automata needed to describe them is not recursively bounded by the size of the smallest unambiguous context-free grammars that generate them. This is a quantitative explanation of the fact that some languages require large descriptions in term...
Balanced Context-Free Grammars, Hedge Grammars and Pushdown Caterpillar Automata
The XML community generally takes trees and hedges as the model for XML document instances and element content. In contrast, Berstel and Boasson have discussed XML documents in the framework of extended context-free grammar, modeling XML documents as Dyck strings and schemas as balanced grammars. How can these two models be brought closer together? We examine the close relationship between Dyck ...
Restarting Automata with Auxiliary Symbols and Small Lookahead
We present a study on lookahead hierarchies for restarting automata with auxiliary symbols and small lookahead. In particular, we show that there are just two different classes of languages recognised by RRWW automata, through the restriction of lookahead size. We also show that the respective (left-) monotone restarting automaton models characterise the context-free languages and that the resp...